
1 TDS Group 3

1.1 Inclusion exclusion criteria

Definitions:

  • Cases: Participants diagnosed with lung/bladder cancer after the date of attending the assessment centre, with no diagnosis of any other cancer before or on the same day as the lung/bladder cancer diagnosis.
  • Controls: Participants with no prevalent cancer of any type at baseline.

*Cancer diagnoses were obtained from HES and coded using ICD-10 and ICD-9 codes. Additional information on cancers at baseline came from self-reported cancers.
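The definitions above can be sketched in base R as follows. This is a minimal illustration, not the group's actual pipeline: the data frame, its column names (`baseline_date`, `target_dx_date`, `other_dx_date`, `self_reported_cancer`) and the helper `classify_participant()` are all hypothetical, and treating participants who are neither cases nor controls as "excluded" is an assumption.

```r
# Hypothetical participant-level data; all column names are illustrative only.
participants <- data.frame(
  id = 1:5,
  baseline_date = as.Date(c("2008-01-10", "2008-03-05", "2009-06-20",
                            "2007-11-02", "2010-02-14")),
  # Date of first lung (or bladder) cancer diagnosis; NA = never diagnosed
  target_dx_date = as.Date(c("2012-05-01", NA, "2009-01-15",
                             "2011-08-30", NA)),
  # Date of earliest diagnosis of any other cancer; NA = none
  other_dx_date = as.Date(c(NA, NA, NA, "2011-08-30", "2006-04-12")),
  # Self-reported cancer at baseline
  self_reported_cancer = c(FALSE, FALSE, FALSE, FALSE, FALSE)
)

classify_participant <- function(d) {
  has_target <- !is.na(d$target_dx_date)
  # Target cancer diagnosed strictly after the baseline assessment
  incident <- has_target & d$target_dx_date > d$baseline_date
  # Another cancer diagnosed before or on the same day as the target cancer
  other_first <- has_target & !is.na(d$other_dx_date) &
    d$other_dx_date <= d$target_dx_date
  # Any cancer already present at baseline (records or self-report)
  prevalent_any <- d$self_reported_cancer |
    (has_target & d$target_dx_date <= d$baseline_date) |
    (!is.na(d$other_dx_date) & d$other_dx_date <= d$baseline_date)

  status <- rep("excluded", nrow(d))
  status[!has_target & !prevalent_any] <- "control"
  status[incident & !other_first & !d$self_reported_cancer] <- "case"
  status
}

participants$status <- classify_participant(participants)
print(participants[, c("id", "status")])
```

In this toy data, participant 4 is excluded because the other cancer was diagnosed on the same day as the target cancer, and participant 5 because of a cancer diagnosed before baseline.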

1.2 Table 1


1.3 Univariate Analysis

Manhattan

P Values

Forest

1.4 Sensitivity Analysis by Age at Diagnosis and Time to Diagnosis

TO BE COMPLETED

Age at Diagnosis Analysis

Time to Diagnosis Analysis

1.5 LASSO Logistic Regression

Models:

  • 1 = Base model
  • 2 = Adjusted for smoking

Data sets were denoised using linear regression for continuous variables and logistic regression for categorical variables. Categorical variables with more than 2 levels were one-hot encoded.
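A minimal base R sketch of this kind of denoising is given below, using simulated data. Several things here are assumptions rather than details from the report: the confounder set (age, sex and BMI), the variable names, the helper functions, and in particular the use of response residuals (observed minus fitted probability) as the denoised version of a binary variable.

```r
set.seed(1)
n <- 200

# Simulated, hypothetical data: three confounders and three exposures.
dat <- data.frame(
  age = rnorm(n, 57, 8),
  sex = factor(sample(c("F", "M"), n, replace = TRUE)),
  bmi = rnorm(n, 27, 4),
  crp = rnorm(n),                  # continuous exposure
  renting = rbinom(n, 1, 0.3),     # binary exposure
  income = factor(sample(c("low", "mid", "high"), n, replace = TRUE))
)
confounders <- dat[, c("age", "sex", "bmi")]

# One-hot encoding: one 0/1 column per level of the multi-level factor.
income_dummies <- model.matrix(~ income - 1, data = dat)

# Continuous variable: residuals of a linear regression on the confounders.
denoise_continuous <- function(x, confounders) {
  residuals(lm(x ~ ., data = confounders))
}

# Binary variable: response residuals of a logistic regression on the
# confounders (the residual type is an assumption of this sketch).
denoise_binary <- function(x, confounders) {
  fit <- glm(x ~ ., data = confounders, family = binomial)
  residuals(fit, type = "response")
}

denoised <- data.frame(
  crp = denoise_continuous(dat$crp, confounders),
  renting = denoise_binary(dat$renting, confounders),
  apply(income_dummies, 2, denoise_binary, confounders = confounders)
)
str(denoised)
```

By construction, the denoised continuous variable is uncorrelated with each numeric confounder in the sample, which is the property the denoising step relies on.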

Additionally, models with forced confounders were run to check for any bias in the denoised data sets.

  • 0.1 = Base model adjusted for sex, age and BMI (forced variables)
  • 0.2 = Adjusted for smoking (forced variables)

Four models were run for each outcome (lung/bladder cancer):

  • One stability selection LASSO logistic regression model was run for each data set.
  • All models were calibrated on binomial deviance using a 50% subsample (drawn at random, stratified by case/control status).
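The procedure described above can be sketched as follows, assuming the `glmnet` package. This is one plausible reading, not the code used for the report: the data are simulated, the helpers `stratified_half()` and `lasso_once()` are hypothetical, choosing `lambda.1se` from a cross-validated deviance curve is an assumption, and the number of iterations is kept small for speed. Forced variables are handled by giving them a penalty factor of 0.

```r
library(glmnet)

set.seed(42)
n <- 400
p <- 10
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("v", 1:p)))
# Simulated outcome driven by v1 and v2 only
y <- rbinom(n, 1, plogis(1.5 * x[, "v1"] - 1.5 * x[, "v2"]))

# Draw a 50% subsample separately within cases and within controls.
stratified_half <- function(y) {
  unlist(lapply(split(seq_along(y), y), function(idx) {
    idx[sample.int(length(idx), size = floor(length(idx) / 2))]
  }), use.names = FALSE)
}

# One LASSO logistic fit on one subsample, calibrated on binomial deviance.
# Variables named in `forced` get penalty factor 0, so they are never shrunk.
lasso_once <- function(x, y, forced = character(0)) {
  idx <- stratified_half(y)
  pf <- as.numeric(!colnames(x) %in% forced)
  fit <- cv.glmnet(x[idx, ], y[idx], family = "binomial", alpha = 1,
                   type.measure = "deviance", penalty.factor = pf)
  as.matrix(coef(fit, s = "lambda.1se"))[-1, 1]  # drop the intercept
}

n_iter <- 30
betas <- replicate(n_iter, lasso_once(x, y))   # p x n_iter matrix
rownames(betas) <- colnames(x)

# Selection proportion: share of subsamples with a non-zero coefficient.
selection_prop <- rowMeans(betas != 0)

# Mean odds ratio over the subsamples in which the variable was selected.
mean_or <- apply(betas, 1, function(b) {
  if (any(b != 0)) mean(exp(b[b != 0])) else NA_real_
})

round(cbind(selection_prop, mean_or), 2)
```

Calling `lasso_once(x, y, forced = c("v9", "v10"))` would run the same fit with `v9` and `v10` left unpenalised, which is how the forced-confounder models can be mimicked in this sketch.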

1.5.1 Denoised

1.5.1.1 Mean Odds Ratios

Lung

Lung: Mean Odds Ratio

Bladder

Bladder: Mean Odds Ratio

1.5.1.2 Selection Proportion

Dashed red line: threshold = max(pi[base model], pi[adjusted model])

Lung

Lung: Selection Proportion

Bladder

Bladder: Selection Proportion

1.5.1.3 Prediction Performance

Lung

Lung: Base model AUC

Lung: Adjusted model AUC

Bladder

Bladder: Base model AUC

Bladder: Adjusted model AUC
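For reference, the AUC shown in these figures can be computed with the rank-sum (Mann-Whitney) identity. The sketch below is illustrative only: the `auc()` helper, the simulated data, the train/test split and the ordering of predictors in `ranked` are all assumptions, and the report's own AUC curves may have been produced differently.

```r
# AUC via the rank-sum identity; average ranks handle tied scores.
auc <- function(score, outcome) {
  r <- rank(score)
  n1 <- sum(outcome == 1)
  n0 <- sum(outcome == 0)
  (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy check: 3 of the 4 case/control pairs are correctly ordered.
auc(c(0.1, 0.4, 0.35, 0.8), c(0, 0, 1, 1))  # 0.75

# AUC as a function of the number of predictors included, on simulated data.
set.seed(7)
n <- 600
x <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("v", 1:4)))
y <- rbinom(n, 1, plogis(x[, "v1"] + 0.5 * x[, "v2"]))
train <- seq_len(n) <= n / 2

# Hypothetical ordering, e.g. by decreasing selection proportion
ranked <- c("v1", "v2", "v3", "v4")

auc_by_size <- sapply(seq_along(ranked), function(k) {
  d <- data.frame(y = y, x[, ranked[1:k], drop = FALSE])
  fit <- glm(y ~ ., data = d[train, ], family = binomial)
  pred <- predict(fit, newdata = d[!train, ], type = "response")
  auc(pred, y[!train])
})
names(auc_by_size) <- paste0("top_", seq_along(ranked))
auc_by_size
```

Evaluating on observations not used for fitting keeps the AUC from being inflated as more predictors are added.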

1.5.2 Forced (using penalty.factor)

1.5.2.1 Mean Odds Ratios

Lung

Lung: Mean Odds Ratio

Bladder

Bladder: Mean Odds Ratio

1.5.2.2 Selection Proportion

Lung

Lung: Selection Proportion

Bladder

Bladder: Selection Proportion

1.6 sPLS

1.6.1 Lung cancer

1.6.1.1 Calibration

Stability analyses for sPLS on lung adjusted for age, sex and BMI

Stability analyses for sPLS on lung adjusted for age, sex, BMI and smoking

1.6.1.2 Stability selection

Lambda = 36, proportion = 0.9

Stability selection for sPLS on lung adjusted for age, sex and BMI

Selection proportion for sPLS on lung adjusted for age, sex and BMI

Use results from stability selection for sPLS, lambda = 36

Loading coefficients from sPLS on lung adjusted for age, sex and BMI

Lambda = 38, proportion = 0.9

Stability selection for sPLS on lung adjusted for age, sex, BMI and smoking

Selection proportion for sPLS on lung adjusted for age, sex, BMI and smoking

Use results from stability selection for sPLS, lambda = 38

Loading coefficients from sPLS on lung adjusted for age, sex, BMI and smoking

1.6.2 Bladder cancer

1.6.2.1 Calibration

Stability analyses for sPLS on bladder adjusted for age, sex and BMI

Stability analyses for sPLS on bladder adjusted for age, sex, BMI and smoking

1.6.2.2 Stability selection

Lambda = 22, proportion = 0.9

Stability selection for sPLS on bladder adjusted for age, sex and BMI

Selection proportion for sPLS on bladder adjusted for age, sex and BMI

Use results from stability selection for sPLS, lambda = 22

Loading coefficients from sPLS on bladder adjusted for age, sex and BMI

Lambda = 26, proportion = 0.9

Stability selection for sPLS on bladder adjusted for age, sex, BMI and smoking

Selection proportion for sPLS on bladder adjusted for age, sex, BMI and smoking

Use results from stability selection for sPLS, lambda = 26

Loading coefficients from sPLS on bladder adjusted for age, sex, BMI and smoking

1.7 Discussion

1.7.1 LASSO

Lung cancer:

  • Selected variables attenuated after adjustment for smoking:
    • Rented accommodation
    • Coffee ≥4 cups
    • HDL cholesterol
    • Total protein
  • Selected variables strengthened after adjustment for smoking:
    • Average household income (31,000-51,999)
  • Selected for both models:
    • High education attainment
    • Average household income (>52,000)
    • Maternal smoking around birth
    • Cardiovascular
    • Respiratory
    • C reactive protein
    • Cholesterol

Bladder cancer:

  • Selected variables attenuated after adjustment for smoking:
    • High education attainment
    • Rented accommodation
    • Parental history of COPD
    • HDL cholesterol
  • Selected variables strengthened after adjustment for smoking:
    • Apolipoprotein A
    • SHBG
    • Testosterone
  • Selected for both models:
    • Cholesterol

Key points:

  • Sociodemographic factors associated with lung cancer but not with bladder cancer.
  • Parental history of COPD associated with bladder cancer through smoking.
  • Bladder cancer: Positive association with SHBG but negative association with testosterone (?)

Questions

  • Why is the LASSO model with forced variables more stringent than the denoised model?
  • How to report the three plots?
    • Selection proportion – order by variable groupings? or selection?
    • AUC as a function of number of predictors included in model – overlap models?

1.7.2 sPLS

1.7.3 Plan

Report (Results section) outline:

  1. Descriptive statistics and univariate analysis
    1. Table 1
    2. Manhattan plots
    3. Forest plots
    4. Scatter plots
  2. Multivariate analysis:
    1. LASSO (OR, Selection proportion and AUC)
    2. gPLS, sgPLS (Loading coefficients and Selection proportion)
  3. Targeted analyses by lung cancer subtypes (if time allows)
    1. Run LASSO (?)
  4. Sensitivity analysis
    1. Stratify by time-to-diagnosis
    2. Stratify by age at diagnosis

Next steps are in bold